1 Data extraction: Fully automatic wrapper generation for search engines 
When a query is submitted to a search engine, the search engine returns a dynamically 
generated result page containing the result records, each of which usually consists of a 
link to and/or snippet of a retrieved Web page. In addition, such a result page often also 
contains information irrelevant to the query, such as information related to the hosting 
site of the search engine and advertisements. In this paper, we present a technique for 
automatically producing wrappers that can be used to extr ... 
This paper studies the problem of extracting data from a Web page that contains several 
structured data records. The objective is to segment these data records, extract data 
items/fields from them and put the data in a database table. This problem has been 
studied by several researchers. However, existing methods still have some serious 
limitations. The first class of methods is based on machine learning, which requires 
human labeling of many examples from each Web site that one is interested in ... 

Keywords: data extraction, data record extraction, wrapper 



Automatic information extraction from large websites 
Valter Crescenzi, Giansalvatore Mecca 

September 2004 Journal of the ACM (JACM), volume 51 issue 5 



http://portal.acm.org/ 5/31/2007 



